Written by Steve Scherrer - July/August 2021

Background

This notebook documents preliminary analysis of tracking data for fish tagged in Molokini Crater between 2020-05-16 and 2021-05-24.

The purpose of this study is to understand how human impacts affect the fish of Molokini Crater.

We are particularly interested in answering the following questions:

  1. Is the presence of fish affected by vessel presence?
  2. Is the proportion of time fish are present within the crater negatively correlated with vessel presence?

Proposed Approach:

  1. Begin by calculating the number of each species tagged and basic summary statistics
  2. Calculate metrics
       • Receiver use
       • Pianka’s niche overlap
       • Residency
  3. Make the following plots
       • Map - Receiver locations
       • Map - Average receiver use by species
       • Scatterplot - Day night plots
       • Bar plot - The number of detections per day (individual)
       • Bar plot - The number of individuals detected (species)
       • Line chart - The proportion of individuals detected n days after tagging (30 day moving average by species)
       • Bar plot - Daily vessel traffic
       • Scatter plot - Vessel traffic vs. proportion of fish detected in crater daily (scatterplot by species)
  4. Perform the following statistical tests
       • Compare residency rates by species
       • Compare residency by species, size, and time at liberty
       • Create a GLM comparing # of individuals in crater regressed against boat traffic and species using AR(1) term on dependent variable on some time scale (daily? 6 hours? depends on resolution of vessel data)

Workspace Setup

Establish Directory Hierarchy

project_directory = '/Users/stephenscherrer/Documents/Programming/Projects/Molokini'
scripts_directory = file.path(project_directory, 'Analysis Scripts')
data_directory = file.path(project_directory, 'Data')
results_directory = file.path(project_directory, 'Results')
figure_directory = file.path(results_directory, 'Figures')

Source package dependencies and utility functions from ‘Utility Functions.R’ file

source(file.path(scripts_directory, 'Utility Functions.R'))
Loading required package: suncalc
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
Loading required package: lubridate

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union

Loading required package: readxl
Loading required package: ggplot2
RStudio Community is a great place to get help:
https://community.rstudio.com/c/tidyverse
Loading required package: data.table
data.table 1.13.0 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
**********
This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.
This is a Mac. Please read https://mac.r-project.org/openmp/. Please engage with Apple and ask them for support. Check r-datatable.com for updates, and our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation. After several years of many reports of installation problems on Mac, it's time to gingerly point out that there have been no similar problems on Windows or Linux.
**********

Attaching package: ‘data.table’

The following objects are masked from ‘package:lubridate’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

Loading required package: reshape2

Attaching package: ‘reshape2’

The following objects are masked from ‘package:data.table’:

    dcast, melt

Loading required package: ggmap
Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
Please cite ggmap if you use it! See citation("ggmap") for details.
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Load Data

  • load various datafiles
## Vessel Traffic data
vessel_df = load_vessel_data(file.path(data_directory, "Molokini_Master_June_21.csv"))
vessel_df$date
   [1] "1346493600" "1346752800" "1346925600" "1347098400" "1347357600"
   [6] "1347530400" "1347703200" "1347962400" "1348308000" "1348567200"
  [11] "1348740000" "1348826400" "1348912800" "1349172000" "1349517600"
  [16] "1349776800" "1349949600" "1350122400" "1350986400" "1351159200"
  [21] "1351591200" "1351764000" "1351936800" "1352196000" "1352368800"
  [26] "1352541600" "1353146400" "1353405600" "1353578400" "1353751200"
  [31] "1354183200" "1354356000" "1354701600" "1354960800" "1355047200"
  [36] "1355133600" "1355220000" "1355392800" "1355565600" "1356170400"
  [41] "1356429600" "1356602400" "1357207200" "1357380000" "1357639200"
  [46] "1357812000" "1358244000" "1358848800" "1359021600" "1359194400"
  [51] "1359453600" "1359626400" "1359799200" "1360058400" "1360231200"
  [56] "1360836000" "1361440800" "1361786400" "1361872800" "1362218400"
  [61] "1362477600" "1362650400" "1362823200" "1362909600" "1363082400"
  [66] "1363255200" "1363341600" "1363428000" "1363687200" "1363860000"
  [71] "1364032800" "1364292000" "1364464800" "1364637600" "1364896800"
  [76] "1365069600" "1365242400" "1365501600" "1365933600" "1366106400"
  [81] "1366279200" "1366711200" "1366884000" "1367316000" "1367920800"
  [86] "1368093600" "1368525600" "1368871200" "1369216800" "1369303200"
  [91] "1369735200" "1369908000" "1370080800" "1370340000" "1370426400"
  [96] "1370512800" "1370685600" "1370944800" "1371117600" "1371290400"
 [101] "1373364000" "1373968800" "1374141600" "1374314400" "1374573600"
 [106] "1374746400" "1374919200" "1375264800" "1375351200" "1375524000"
 [111] "1375783200" "1376128800" "1376388000" "1376560800" "1376733600"
 [116] "1376992800" "1377165600" "1377338400" "1377597600" "1377770400"
 [121] "1378202400" "1378375200" "1378548000" "1378807200" "1378980000"
 [126] "1379152800" "1379412000" "1379584800" "1380016800" "1380189600"
 [131] "1380621600" "1380967200" "1381226400" "1381399200" "1381572000"
 [136] "1381744800" "1381831200" "1382004000" "1382176800" "1382349600"
 [141] "1382436000" "1382608800" "1382781600" "1382954400" "1383040800"
 [146] "1383213600" "1383386400" "1383559200" "1383818400" "1383904800"
 [151] "1384250400" "1384423200" "1384596000" "1384768800" "1384855200"
 [156] "1384941600" "1385028000" "1385373600" "1385632800" "1385978400"
 [161] "1386064800" "1386237600" "1386410400" "1386583200" "1387188000"
 [166] "1387274400" "1387792800" "1389261600" "1389607200" "1389693600"
 [171] "1389866400" "1390039200" "1390557600" "1390816800" "1391076000"
 [176] "1392026400" "1392112800" "1392285600" "1392631200" "1392717600"
 [181] "1392804000" "1392890400" "1393668000" "1393927200" "1394100000"
 [186] "1394704800" "1395914400" "1396087200" "1396519200" "1396692000"
 [191] "1397124000" "1397728800" "1398074400" "1399111200" "1399197600"
 [196] "1399456800" "1399543200" "1399716000" "1400148000" "1400320800"
 [201] "1400407200" "1400580000" "1400752800" "1401357600" "1401789600"
 [206] "1401876000" "1401962400" "1402135200" "1402567200" "1402740000"
 [211] "1403172000" "1403344800" "1403604000" "1403776800" "1403949600"
 [216] "1404208800" "1404381600" "1405072800" "1405159200" "1405418400"
 [221] "1405504800" "1405591200" "1405677600" "1406023200" "1406196000"
 [226] "1406800800" "1406973600" "1407232800" "1407578400" "1408183200"
 [231] "1408442400" "1408788000" "1409047200" "1409479200" "1409565600"
 [236] "1410084000" "1410170400" "1410256800" "1410343200" "1410429600"
 [241] "1410775200" "1410861600" "1410948000" "1411034400" "1411207200"
 [246] "1411466400" "1411639200" "1411984800" "1412071200" "1412244000"
 [251] "1412416800" "1412416800" "1412589600" "1412676000" "1412848800"
 [256] "1413021600" "1413194400" "1413453600" "1413799200" "1413885600"
 [261] "1414058400" "1414404000" "1414490400" "1414836000" "1415008800"
 [266] "1415095200" "1415181600" "1415268000" "1415613600" "1415700000"
 [271] "1415872800" "1416045600" "1416477600" "1416650400" "1416823200"
 [276] "1417514400" "1417860000" "1418032800" "1418119200" "1418292000"
 [281] "1418637600" "1418896800" "1419501600" "1419674400" "1419933600"
 [286] "1420106400" "1420365600" "1420452000" "1420711200" "1421056800"
 [291] "1421143200" "1421661600" "1421748000" "1421920800" "1422352800"
 [296] "1422612000" "1423476000" "1423562400" "1424080800" "1424167200"
 [301] "1424340000" "1424685600" "1424772000" "1424944800" "1426154400"
 [306] "1426500000" "1426586400" "1426759200" "1427104800" "1427191200"
 [311] "1427709600" "1427968800" "1428141600" "1428400800" "1428919200"
 [316] "1429005600" "1429178400" "1429524000" "1429783200" "1430733600"
 [321] "1430820000" "1430906400" "1430992800" "1431252000" "1432548000"
 [326] "1432634400" "1432634400" "1432720800" "1432807200" "1433066400"
 [331] "1433239200" "1433325600" "1433412000" "1433844000" "1434016800"
 [336] "1434189600" "1434362400" "1434448800" "1434794400" "1434967200"
 [341] "1435053600" "1435226400" "1435572000" "1435658400" "1435831200"
 [346] "1436176800" "1436263200" "1436349600" "1436695200" "1436781600"
 [351] "1436868000" "1437040800" "1437300000" "1437472800" "1437645600"
 [356] "1438077600" "1438077600" "1438164000" "1438250400" "1438682400"
 [361] "1440496800" "1440583200" "1440669600" "1441101600" "1441188000"
 [366] "1441274400" "1441274400" "1441447200" "1441447200" "1441533600"
 [371] "1441620000" "1441706400" "1441706400" "1441879200" "1441879200"
 [376] "1442052000" "1442138400" "1442311200" "1442397600" "1442484000"
 [381] "1442656800" "1442743200" "1442829600" "1442916000" "1443088800"
 [386] "1443520800" "1445508000" "1445767200" "1445853600" "1446112800"
 [391] "1446372000" "1446458400" "1446544800" "1446717600" "1446976800"
 [396] "1447063200" "1447322400" "1447581600" "1447668000" "1447927200"
 [401] "1448186400" "1448272800" "1448791200" "1448877600" "1448964000"
 [406] "1449396000" "1449482400" "1450000800" "1450346400" "1450778400"
 [411] "1450951200" "1451210400" "1451728800" "1451815200" "1451988000"
 [416] "1452160800" "1452333600" "1452420000" "1452506400" "1452592800"
 [421] "1452765600" "1452938400" "1453111200" "1453197600" "1453543200"
 [426] "1453629600" "1453975200" "1454234400" "1454320800" "1454407200"
 [431] "1454580000" "1455012000" "1455184800" "1455444000" "1455616800"
 [436] "1455962400" "1456048800" "1456653600" "1456826400" "1456999200"
 [441] "1457258400" "1457344800" "1457863200" "1458468000" "1458554400"
 [446] "1458813600" "1458900000" "1459072800" "1459418400" "1459591200"
 [451] "1459677600" "1460196000" "1460368800" "1460628000" "1460887200"
 [456] "1461232800" "1461319200" "1461405600" "1461492000" "1461578400"
 [461] "1461837600" "1462096800" "1462183200" "1462269600" "1462356000"
 [466] "1462528800" "1462615200" "1462874400" "1463047200" "1463652000"
 [471] "1463911200" "1463997600" "1464084000" "1464256800" "1464343200"
 [476] "1464343200" "1464429600" "1464429600" "1464516000" "1464861600"
 [481] "1464861600" "1465034400" "1465120800" "1465466400" "1465639200"
 [486] "1465639200" "1465725600" "1466071200" "1466157600" "1466330400"
 [491] "1466416800" "1466503200" "1466589600" "1466676000" "1466935200"
 [496] "1467108000" "1467626400" "1467712800" "1467885600" "1468058400"
 [501] "1468144800" "1468231200" "1468317600" "1468663200" "1468749600"
 [506] "1468836000" "1468922400" "1469095200" "1469440800" "1469527200"
 [511] "1469700000" "1469872800" "1469959200" "1470045600" "1470132000"
 [516] "1470304800" "1470650400" "1470736800" "1471255200" "1471341600"
 [521] "1471514400" "1471773600" "1471860000" "1471946400" "1472119200"
 [526] "1472983200" "1473069600" "1473156000" "1473242400" "1473328800"
 [531] "1473501600" "1473588000" "1473674400" "1473760800" "1473847200"
 [536] "1473933600" "1474538400" "1474624800" "1474711200" "1474797600"
 [541] "1474884000" "1474970400" "1475143200" "1475316000" "1475402400"
 [546] "1475488800" "1475488800" "1475575200" "1475748000" "1476007200"
 [551] "1476093600" "1476352800" "1476612000" "1476784800" "1477216800"
 [556] "1477821600" "1477994400" "1478426400" "1478512800" "1478599200"
 [561] "1478599200" "1479117600" "1479204000" "1479376800" "1479376800"
 [566] "1479636000" "1479722400" "1479808800" "1480240800" "1480327200"
 [571] "1480413600" "1480845600" "1481450400" "1481623200" "1481968800"
 [576] "1482141600" "1482228000" "1482573600" "1482660000" "1482832800"
 [581] "1483437600" "1483524000" "1483610400" "1483869600" "1483956000"
 [586] "1484042400" "1484388000" "1484474400" "1484560800" "1484647200"
 [591] "1484820000" "1485165600" "1485252000" "1485511200" "1485597600"
 [596] "1485684000" "1486029600" "1486202400" "1486288800" "1486461600"
 [601] "1486634400" "1486980000" "1487066400" "1487412000" "1487671200"
 [606] "1487844000" "1488016800" "1488103200" "1488189600" "1488276000"
 [611] "1488448800" "1488535200" "1488967200" "1489053600" "1489226400"
 [616] "1489312800" "1489399200" "1489485600" "1489572000" "1489658400"
 [621] "1489744800" "1490090400" "1490176800" "1490263200" "1490436000"
 [626] "1490436000" "1490522400" "1490608800" "1490695200" "1490781600"
 [631] "1490868000" "1490954400" "1491040800" "1491127200" "1491213600"
 [636] "1491472800" "1491645600" "1491732000" "1491818400" "1491904800"
 [641] "1491991200" "1492077600" "1492250400" "1492423200" "1492509600"
 [646] "1492682400" "1492855200" "1495188000" "1495274400" "1495360800"
 [651] "1495447200" "1495533600" "1495620000" "1495706400" "1495879200"
 [656] "1495965600" "1496052000" "1496311200" "1496484000" "1496570400"
 [661] "1496656800" "1496743200" "1496829600" "1497002400" "1497088800"
 [666] "1497175200" "1497261600" "1497348000" "1497434400" "1497520800"
 [671] "1497607200" "1497693600" "1497780000" "1497866400" "1497952800"
 [676] "1498039200" "1498125600" "1498212000" "1498298400" "1498384800"
 [681] "1498471200" "1498557600" "1498730400" "1498903200" "1498989600"
 [686] "1499076000" "1499162400" "1499248800" "1499508000" "1499594400"
 [691] "1499680800" "1499767200" "1500112800" "1500199200" "1500285600"
 [696] "1500372000" "1500458400" "1500717600" "1500804000" "1500890400"
 [701] "1500976800" "1501063200" "1501668000" "1501927200" "1502100000"
 [706] "1502186400" "1502272800" "1502359200" "1502532000" "1502618400"
 [711] "1502704800" "1502791200" "1502877600" "1503136800" "1503223200"
 [716] "1503309600" "1503396000" "1503482400" "1503568800" "1503655200"
 [721] "1503914400" "1504000800" "1504087200" "1504260000" "1504346400"
 [726] "1504432800" "1504519200" "1504605600" "1504692000" "1504778400"
 [731] "1504951200" "1505037600" "1505124000" "1505210400" "1505296800"
 [736] "1505383200" "1505556000" "1505642400" "1505728800" "1505815200"
 [741] "1505901600" "1505988000" "1506160800" "1506420000" "1506506400"
 [746] "1506765600" "1506852000" "1506938400" "1507024800" "1507197600"
 [751] "1507370400" "1507456800" "1507543200" "1507629600" "1507716000"
 [756] "1507975200" "1508148000" "1508234400" "1508580000" "1508666400"
 [761] "1508752800" "1509184800" "1509271200" "1509357600" "1509444000"
 [766] "1509876000" "1510135200" "1510221600" "1510394400" "1510480800"
 [771] "1510567200" "1510999200" "1511258400" "1511690400" "1511863200"
 [776] "1512554400" "1512640800" "1512813600" "1512900000" "1512986400"
 [781] "1513072800" "1513504800" "1513591200" "1513677600" "1513764000"
 [786] "1514196000" "1514455200" "1514628000" "1514714400" "1514800800"
 [791] "1514887200"
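The `date` column prints as character strings of Unix epoch seconds rather than calendar dates. Below is a minimal sketch of one way to convert them, assuming the values are seconds since 1970-01-01 UTC and that Hawaii local time (`Pacific/Honolulu`) is the intended time zone. `epoch_to_date` is a hypothetical helper and is not part of 'Utility Functions.R'.

```r
## Hypothetical helper: convert character epoch seconds to calendar dates
epoch_to_date = function(epoch_chr, tz = 'Pacific/Honolulu') {
  ## as.numeric() parses the strings; origin anchors them to the Unix epoch
  datetimes = as.POSIXct(as.numeric(epoch_chr), origin = '1970-01-01', tz = tz)
  ## Pass tz again so the date is taken in local time rather than UTC
  as.Date(datetimes, tz = tz)
}

## First three values printed above
example_dates = epoch_to_date(c('1346493600', '1346752800', '1346925600'))
print(example_dates)
```

Under these assumptions the three values map to 2012-09-01, 2012-09-04 and 2012-09-06, each at midnight Hawaii time.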

Clean Data

  • Associate detections with time of day (day, night, dawn, dusk)
  • Remove detections from tags not associated with this study
  • Remove false detections
## Associate detections with time of day
molo_df = get_time_of_day(molo_df)

## Combine vue df with tagging df - remove irrelevant tags in the process
molo_df = inner_join(x = molo_df, y = tagging_df[ ,c('tag_id', 'species', 'fork_length', 'tagging_date' )], by = 'tag_id')

## Filter false detections
# molo_df = filter_false_detections(molo_df)

Exploratory Data Analysis

Count of individuals tagged by species

## Get count of individuals tagged by species
tags_by_species = aggregate(tag_id ~ species, data = tagging_df, FUN = uniqueN)
colnames(tags_by_species) = c('species', 'tagged')

## Merge with count of individuals detected by species
detected_by_species = aggregate(tag_id ~ species, data = molo_df, FUN = uniqueN)
colnames(detected_by_species) = c('species', 'detected')
tags_by_species = left_join(tags_by_species, detected_by_species, by = 'species')

## Replace NA values with 0
tags_by_species[is.na(tags_by_species)] = 0

print(tags_by_species)

Summary Statistics

# Time at liberty
time_at_liberty = calculate_time_at_liberty(molo_df)

# Days Detected
days_detected = calculate_days_detected(molo_df)

# % of days detected
detection_stats = merge(x = days_detected, y = time_at_liberty[ ,c('tag_id', 'days_at_liberty')], by = 'tag_id')
detection_stats$percent_days_detected = round(detection_stats$unique_days / detection_stats$days_at_liberty, 4) * 100

# Merge with tagging data to get fish info
## Join on tag_id (assumes tag_id is the only column the two tables share)
detection_stats = merge(x = tagging_df[ ,c('tagging_date', 'species', 'tag_id', 'fork_length')], y = detection_stats, by = 'tag_id')
detection_stats = detection_stats[order(detection_stats$species, detection_stats$tagging_date, detection_stats$tag_id), ]
print(detection_stats)

Metric Calculations

index of receiver use

## Receiver use = detections of a tag at a given receiver / all detections of that tag
## (later averaged across individuals and across species)

## Calculate unique detections per tag per receiver station
detections_per_tag_per_receiver = aggregate(datetime~tag_id+receiver+species, data = molo_df, FUN = uniqueN)
colnames(detections_per_tag_per_receiver) = c('tag_id', 'receiver', 'species', 'detections')

## Calculate receiver use metric for each fish and receiver pair
## ave() returns each tag's total detections, repeated on every row for that tag
detections_per_tag_per_receiver$receiver_use = detections_per_tag_per_receiver$detections / 
  ave(detections_per_tag_per_receiver$detections, detections_per_tag_per_receiver$tag_id, FUN = sum)

## Calculate average receiver use metric for each tag - Omit stations with no use as this would bias metric
indvidual_receiver_use = aggregate(receiver_use~tag_id+species, data = detections_per_tag_per_receiver[detections_per_tag_per_receiver$receiver_use > 0, ], FUN = mean)

## Add this information to detection_stats
detection_stats = merge(detection_stats, indvidual_receiver_use, by = c('tag_id', 'species'))

## Calculate receiver use metric by species
species_receiver_use = aggregate(receiver_use~species, data = indvidual_receiver_use, FUN = mean)
colnames(species_receiver_use) = c('species', 'receiver_use')

print(species_receiver_use)
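To make the receiver use metric concrete, here is a small worked example on made-up numbers. The `toy` data frame and its values are hypothetical and only illustrate the arithmetic.

```r
## Hypothetical detections: tag A is heard at two receivers, tag B at one
toy = data.frame(tag_id = c('A', 'A', 'B'),
                 receiver = c('R1', 'R2', 'R1'),
                 detections = c(30, 10, 5))

## Share of each tag's detections logged at each receiver
toy$receiver_use = toy$detections / ave(toy$detections, toy$tag_id, FUN = sum)

## Average share per tag across the receivers it used
toy_mean_use = aggregate(receiver_use ~ tag_id, data = toy, FUN = mean)
print(toy)
print(toy_mean_use)
```

Tag A gets shares of 0.75 and 0.25 and tag B a share of 1. Because a tag's shares sum to 1 over the receivers it used, the per-tag mean equals 1 / (number of receivers used): 0.5 for tag A and 1 for tag B. Lower values therefore indicate use of more receivers, which is worth keeping in mind when comparing species.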

Calculate Pianka’s Niche Overlap Index - Pianka (1973) The Structure of Lizard Communities

0 = no overlap, 1 = perfect overlap
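Pianka's index for two species j and k is the sum over receivers of the product of their use proportions, divided by the square root of the product of their summed squared proportions. A minimal sketch of that formula as a standalone function follows. `pianka_overlap` is a hypothetical helper name and the input vectors are made up.

```r
## Hypothetical helper: Pianka's overlap between two vectors of receiver use,
## one value per receiver, in the same receiver order for both species
pianka_overlap = function(p_j, p_k) {
  sum(p_j * p_k) / sqrt(sum(p_j ^ 2) * sum(p_k ^ 2))
}

## Identical use of the same receivers
same_use = pianka_overlap(c(0.5, 0.5, 0), c(0.5, 0.5, 0))

## No receivers in common
no_shared_use = pianka_overlap(c(1, 0, 0), c(0, 1, 0))

print(c(same_use, no_shared_use))
```

The two calls return 1 and 0, matching the interpretation of the index given above.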

## Aggregate data averaged by species
receiver_use_aggregated_by_species = aggregate(receiver_use ~ species + receiver , data = detections_per_tag_per_receiver, FUN = mean)
  colnames(receiver_use_aggregated_by_species) = c('species', 'receiver', 'avg_use_index')
  
## Reshape from Long to Wide format
receiver_use_aggregated_by_species_wide = dcast(receiver_use_aggregated_by_species, species ~ receiver, value.var = 'avg_use_index')

## Get all species combinations 
species_combos = data.frame()
for (i in 1:nrow(receiver_use_aggregated_by_species_wide)){
    if(i != nrow(receiver_use_aggregated_by_species_wide)){
    for (j in (i+1):nrow(receiver_use_aggregated_by_species_wide)){
      species_combos = rbind(species_combos, data.frame('species_1' = receiver_use_aggregated_by_species_wide$species[i], 'species_2' = receiver_use_aggregated_by_species_wide$species[j]))
    }
  }
}

## Change any NA values to zero
receiver_use_aggregated_by_species_wide[is.na(receiver_use_aggregated_by_species_wide)] = 0

## Calculate Pianka's index for all pairs
species_combos$pianka_index = 0
for(i in 1:nrow(species_combos)){
  species_combos$pianka_index[i] = sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_1[i], -1] * 
     receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_2[i], -1]) /
    (sqrt(sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_1[i], -1] ^ 2) * 
    sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_2[i], -1] ^ 2)))
}

## Round to 3 digits
species_combos$pianka_index = round(species_combos$pianka_index, 3)

print(species_combos)

Plots

Study Area

## Plot study area and receivers
molo_basemap = get_map(location = c(lon = -156.496331, lat = 20.633007), zoom = 16, maptype = 'satellite')
Source : https://maps.googleapis.com/maps/api/staticmap?center=20.633007,-156.496331&zoom=16&size=640x640&scale=2&maptype=satellite&language=en-EN&key=xxx
receiver_map = ggmap(molo_basemap) + geom_point(data = molo_df, mapping = aes(x = lon, y = lat), col = 'red') + labs(x = '°Longitude', y = '°Latitude')
## ggsave() is called on its own rather than added to the plot with `+`
ggsave(filename = 'Receiver Locations Google Map.pdf', plot = receiver_map, path = figure_directory)
Saving 7 x 7 in image
print(receiver_map)

Species Use Plots

## Get average use of receiver by species 
species_receiver_use = aggregate(receiver_use~species+receiver, data = detections_per_tag_per_receiver, FUN = mean)
  colnames(species_receiver_use) = c('species', 'receiver' , 'receiver_use')
  
## Merge with lat lon positions for each receiver from molo_df
receiver_postions =  unique(molo_df[ ,c('receiver', 'lat', 'lon')])
species_receiver_use = merge(x = species_receiver_use, y = receiver_postions, by = 'receiver', all.x = TRUE, all.y = FALSE)

## Make species plots for receiver use (one plot per species)
for(species in unique(species_receiver_use$species)){
  receiver_use_by_spp = ggmap(molo_basemap) + 
    geom_point(data = species_receiver_use[species_receiver_use$species == species, ], 
               mapping = aes(x = lon, y = lat, size = receiver_use), color = 'red') + 
    labs(x = '°Longitude', y = '°Latitude')
  ggsave(filename = paste('Receiver Use by ', species, '.pdf', sep = ''), plot = receiver_use_by_spp, path = figure_directory)
  print(receiver_use_by_spp)
}
Saving 7 x 7 in image

Day Night Plots

## By Individual
for (tag_id in unique(molo_df$tag_id)){
  print(tag_id)
    pdf(file = file.path(figure_directory, paste('Day Night Plot - Tag ID ', tag_id, '.pdf', sep = '')))
  plot_day_night(molo_df[molo_df$tag_id == tag_id, ], plot_title = paste(tagging_df$species[tagging_df$tag_id == tag_id], '- Tag', as.character(tag_id), sep = ' '))
  dev.off()
}
[1] "47513"
[1] "30711"
[1] "30754"
[1] "51591"
[1] "51590"
[1] "51593"
[1] "39194"
[1] "30755"
[1] "51594"

Barplot of detections by date

## Make and save plot
detection_barplot = ggplot(data = all_detections_long, mapping = aes(x = date, y = detections)) +
  geom_bar(stat = "identity") + 
  labs(title = 'All Tagged Individuals', x = 'Date', y = 'Detections')
ggsave(filename = 'Daily Detection Barplot - all tags.pdf', plot = detection_barplot, path = figure_directory)
print(detection_barplot)
Saving 7 x 7 in image

Bar plot # of Fish (standardized percent of fish tagged to date) by date and Spp

THIS NEEDS WORK!!!

Bar plot vessel traffic by date

In the future, might also consider max vessels present at a given time

## Calculate Daily Vessel Stats
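## vessel_df$date holds character epoch seconds that appear to fall on midnight Hawaii time; convert to calendar dates first
vessel_df$day = as.Date(as.POSIXct(as.numeric(vessel_df$date), origin = '1970-01-01', tz = 'Pacific/Honolulu'), tz = 'Pacific/Honolulu')
vessels_per_day = aggregate(vessel_name ~ day, data = vessel_df, FUN = uniqueN)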

Scatter plot: boat traffic (x axis) vs. fish presence / absence (y axis), colored by species

Scatter plot: boat traffic (x axis) vs. detections per individual (y axis), colored by species, with error bars for daily detections

Residency and dispersal

## Calculate residency
detection_stats$residence_metric = detection_stats$unique_days / detection_stats$days_at_liberty

## Assign residence category: Low = < 1/3, Medium = 1/3 to < 2/3, High = >= 2/3 of days detected (Tinhan et al. 2014)
## right = FALSE makes each interval closed on the left, so a value of exactly 1/3 is 'Medium' and exactly 2/3 is 'High'
detection_stats$residence_category = as.character(cut(detection_stats$residence_metric, 
                                                      breaks = c(-Inf, 1/3, 2/3, Inf), 
                                                      labels = c('Low', 'Medium', 'High'), 
                                                      right = FALSE))

## Create grouped barplot of residency by species
residence_counts_by_species = aggregate(tag_id ~ species + residence_category, data = detection_stats, FUN = length)

ggplot(data = residence_counts_by_species, mapping = aes(x=species, y=tag_id, fill=residence_category)) +
  geom_bar(stat="identity", position = "dodge")

Takeaways - All 4 omilus were highly resident as were grey reef sharks. No other species have replicates so…?

Calculate 30 day moving average of residency, then plot against days since tagging
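One way the 30 day moving average could be computed is sketched below for a single fish, assuming a vector with one 0/1 value per day since tagging (1 = detected that day). `moving_average` and `daily_presence` are hypothetical names and the data are made up.

```r
## Hypothetical helper: trailing moving average over the previous `window` days
moving_average = function(x, window = 30) {
  ## stats::filter is called explicitly because dplyr masks filter()
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

## Made-up record: detected daily for 40 days, then absent for 20 days
daily_presence = c(rep(1, 40), rep(0, 20))
residency_ma = moving_average(daily_presence)

## Plot against days since tagging
plot(seq_along(residency_ma), residency_ma, type = 'l',
     xlab = 'Days since tagging', ylab = '30 day moving average of residency')
```

The first 29 values are `NA` because a full 30 day window is not yet available. Averaging these per-fish series by species would give the line chart described in the proposed approach.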

Statistical analysis

Calculate mean residency by species (regardless of time), then run an ANOVA by species. Use Tukey’s HSD to determine which pairwise differences are significant.

## ANOVA model for residency metric by species
residence_by_species_anova = aov(residence_metric ~ species, data=detection_stats)
summary(residence_by_species_anova)
            Df Sum Sq Mean Sq F value  Pr(>F)   
species      4 0.6781 0.16954   36.28 0.00212 **
Residuals    4 0.0187 0.00467                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## Tukey's Honestly Significant Differences between species
TukeyHSD(residence_by_species_anova)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = residence_metric ~ species, data = detection_stats)

$species
                                           diff        lwr        upr     p adj
omilu-grey reef shark                0.13134256 -0.1318229  0.3945080 0.3282773
sandbar shark-grey reef shark       -0.78499266 -1.1571648 -0.4128205 0.0034142
ulua-grey reef shark                -0.04021073 -0.4123829  0.3319614 0.9852738
whitetip reef shark-grey reef shark  0.01604564 -0.3561265  0.3882178 0.9995567
sandbar shark-omilu                 -0.91633521 -1.2560803 -0.5765901 0.0013197
ulua-omilu                          -0.17155329 -0.5112984  0.1681918 0.3205721
whitetip reef shark-omilu           -0.11529692 -0.4550421  0.2244482 0.6064486
ulua-sandbar shark                   0.74478192  0.3150345  1.1745293 0.0071653
whitetip reef shark-sandbar shark    0.80103830  0.3712909  1.2307857 0.0054544
whitetip reef shark-ulua             0.05625637 -0.3734910  0.4860037 0.9710009

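GLM comparing residency by species: the dependent variable is the residency index; time at liberty is a candidate independent variable but is not in the model fitted here

## Fit binomial GLM of residency metric by species (matches the Call shown in the summary)
species_glm = glm(residence_metric ~ species, family = binomial(logit), data = detection_stats)
summary(species_glm)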

Call:
glm(formula = residence_metric ~ species, family = binomial(logit), 
    data = detection_stats)

Deviance Residuals: 
       1         2         3         4         5         6         7  
 0.00007   0.00007   0.00000  -0.26156   0.32896  -0.00022   0.00007  
       8         9  
 0.00000   0.00000  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                  1.8652     2.0751   0.899    0.369
speciesomilu                 4.0251     9.7553   0.413    0.680
speciessandbar shark        -4.2953     4.2135  -1.019    0.308
speciesulua                 -0.3098     3.3547  -0.092    0.926
specieswhitetip reef shark   0.1458     3.7297   0.039    0.969

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4.31486  on 8  degrees of freedom
Residual deviance: 0.17663  on 4  degrees of freedom
AIC: 11.401

Number of Fisher Scoring iterations: 8

No significant differences between species were found in this model (all species coefficients have p > 0.3).
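The model above uses the proportion `residence_metric` as the response without weights, so R's binomial family treats each fish as a single trial and typically warns about non-integer successes. One alternative, sketched here on made-up numbers, is to pass the counts of days detected and days not detected so that each fish is weighted by its days at liberty. `toy_stats` is hypothetical; the column names mirror `detection_stats`.

```r
## Made-up data: three fish from each of two species, 100 days at liberty each
toy_stats = data.frame(
  species = c('omilu', 'omilu', 'omilu', 'ulua', 'ulua', 'ulua'),
  unique_days = c(90, 80, 95, 40, 55, 30),
  days_at_liberty = rep(100, 6)
)

## Two-column response: (days detected, days not detected)
toy_glm = glm(cbind(unique_days, days_at_liberty - unique_days) ~ species,
              family = binomial(logit), data = toy_stats)
summary(toy_glm)

## Back-transform the intercept to the pooled detection proportion for the baseline species
baseline_proportion = unname(plogis(coef(toy_glm)[1]))
print(baseline_proportion)
```

This is a sketch of the technique rather than a re-analysis. Daily detections of the same fish are unlikely to be independent, so the resulting standard errors would still need scrutiny (for example by checking for overdispersion).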

GLM comparing time in crater to vessel traffic
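One possible shape for this model is sketched below on simulated data, assuming a daily time step. A Poisson GLM regresses the number of fish present on vessel count plus the previous day's fish count. The lagged response is a simple stand-in for the AR(1) term named in the proposed approach, not a full autoregressive error model, and species is left out for brevity. All names and values here are hypothetical.

```r
## Simulate hypothetical daily data
set.seed(1)
n_days = 100
toy_daily = data.frame(day = 1:n_days, vessels = rpois(n_days, lambda = 8))
toy_daily$fish_present = rpois(n_days, lambda = exp(1.5 - 0.05 * toy_daily$vessels))

## Previous day's count as a lag-1 predictor (the first day has no predecessor)
toy_daily$lag_fish = c(NA, head(toy_daily$fish_present, -1))

## glm() drops the row with the missing lag by default
vessel_glm = glm(fish_present ~ vessels + lag_fish, family = poisson(log), data = toy_daily)
summary(vessel_glm)
```

With real data, `toy_daily` would be replaced by a table of daily vessel counts joined to daily counts of individuals detected, and the time step would depend on the resolution of the vessel data.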

---
title: "Molokini Analysis"
output: html_notebook
---

# Written by Steve Scherrer - July/August 2021

## Background
This notebook documents preliminary analysis of tracking data for fish tagged in Molokini Crater between 2020-05-16 and 2021-05-24. 

The purpose of this study is to understand how human impacts affect the fish of Molokini Crater.

We are particularly interested in answering the following questions:

1. Is the presence of fish affected by vessel presence?
2. Is the proportion of time fish are present within the crater negatively correlated with vessel presence?

Proposed Approach:

1. Begin by calculating the number of each species tagged and basic summary statistics
2. Calculate Metrics
    - Receiver Use
    - Pianka's Niche Overlap
    - Residency
3. Make the following plots
    - Map - Receiver locations
    - Map - Average receiver use by Species
    - Scatterplot - day night plots
    - Bar Plot - The number of detections per day (individual)
    - Bar Plot - The number of individuals detected (species)
    - Line Chart - The proportion of individuals detected n days after tagging (30 day moving average by species)
    - Bar Plot - Daily vessel traffic
    - Scatter Plot - vessel traffic vs. proportion of fish detected in crater daily (scatterplot by species)
4. Perform the following statistical Tests
    - Compare Residency Rates by Species
    - Compare residency by species, size, and time at liberty
    - Create a GLM comparing # of individuals in crater regressed against boat traffic and species using AR(1) term on dependent variable on some time scale (daily? 6 hours? depends on resolution of vessel data)

# Workspace Setup
## Establish Directory Hierarchy
```{r}
project_directory = '/Users/stephenscherrer/Documents/Programming/Projects/Molokini'
scripts_directory = file.path(project_directory, 'Analysis Scripts')
data_directory = file.path(project_directory, 'Data')
results_directory = file.path(project_directory, 'Results')
figure_directory = file.path(results_directory, 'Figures')
```

## Source package dependencies and utility functions from 'Utility Functions.R' file
```{r}
source(file.path(scripts_directory, 'Utility Functions.R'))
```

## Load Data
- load various datafiles 
```{r}
## Files from VUE 
molo_df = load_vemco_data(file.path(data_directory, 'VUE_Export.csv'))
false_detections_df = load_fdf_report(file.path(data_directory, 'FDA.csv'))

## Vessel Traffic data
vessel_df = load_vessel_data(file.path(data_directory, "Molokini_Master_June_21.csv"))

## Metadata Files
tagging_df = load_tagging_data(file.path(data_directory, 'Molokini_Fish_Tagging_master.xlsx'))

# receiver_df = load_receiver_data(file.path(data_directory, ))
```

## Clean Data
- Associate detections with time of day (day, night, dawn, dusk)
- Remove detections from tags not associated with this study
- Remove false detections
```{r}
## Associate detections with time of day
molo_df = get_time_of_day(molo_df)

## Combine vue df with tagging df - remove irrelevant tags in the process
molo_df = inner_join(x = molo_df, y = tagging_df[ ,c('tag_id', 'species', 'fork_length', 'tagging_date' )], by = 'tag_id')

## Filter false detections
# molo_df = filter_false_detections(molo_df)
```

# Exploratory Data Analysis
## Count of individuals tagged by species
```{r}
## Get count of individuals tagged by species
tags_by_species = aggregate(tag_id ~ species, data = tagging_df, FUN = uniqueN)
colnames(tags_by_species) = c('species', 'tagged')

## Merge with count of individuals detected by species
detected_by_species = aggregate(tag_id ~ species, data = molo_df, FUN = uniqueN)
colnames(detected_by_species) = c('species', 'detected')
tags_by_species = left_join(tags_by_species, detected_by_species, by = 'species')

## Replace NA values with 0
tags_by_species[is.na(tags_by_species)] = 0

print(tags_by_species)
```

## Summary Statistics
```{r}
# Time at liberty
time_at_liberty = calculate_time_at_liberty(molo_df)

# Days Detected
days_detected = calculate_days_detected(molo_df)

# % of days detected
detection_stats = merge(x = days_detected, y = time_at_liberty[ ,c('tag_id', 'days_at_liberty')], by = 'tag_id')
detection_stats$percent_days_detected = round(detection_stats$unique_days / detection_stats$days_at_liberty, 4) * 100

# Merge with tagging data to get fish info
detection_stats = merge(x = tagging_df[ ,c('tagging_date', 'species', 'tag_id', 'fork_length')], y = detection_stats, by = 'tag_id')
detection_stats = detection_stats[order(detection_stats$species, detection_stats$tagging_date, detection_stats$tag_id), ]
print(detection_stats)
```

# Metric Calculations
## index of receiver use
```{r}
## Sum all spp, sum all individuals (detections of tag at given receiver / all detections of tag)

## Calculate unique detections per tag per receiver station
detections_per_tag_per_receiver = aggregate(datetime~tag_id+receiver+species, data = molo_df, FUN = uniqueN)
colnames(detections_per_tag_per_receiver) = c('tag_id', 'receiver', 'species', 'detections')

## Calculate receiver use metric for each fish and receiver pair:
## detections at a receiver divided by that tag's total detections across all receivers
detections_per_tag_per_receiver$receiver_use = detections_per_tag_per_receiver$detections / 
  ave(detections_per_tag_per_receiver$detections, detections_per_tag_per_receiver$tag_id, FUN = sum)

## Calculate average receiver use metric for each tag - Omit stations with no use as this would bias metric
individual_receiver_use = aggregate(receiver_use~tag_id+species, data = detections_per_tag_per_receiver[detections_per_tag_per_receiver$receiver_use > 0, ], FUN = mean)

## Add this information to detection_stats (both tables carry tag_id and species)
detection_stats = merge(detection_stats, individual_receiver_use, by = c('tag_id', 'species'))

## Calculate receiver use metric by species
species_receiver_use = aggregate(receiver_use~species, data = individual_receiver_use, FUN = mean)
colnames(species_receiver_use) = c('species', 'receiver_use')

print(species_receiver_use)
``` 
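
As a toy illustration of the receiver use index computed above, the sketch below applies the same calculation (detections at a receiver divided by all detections of that tag) to made-up counts. The data frame `toy` and its values are hypothetical and are not study data.

```r
## Hypothetical detection counts for two tags across three receivers
toy = data.frame(tag_id = c('A', 'A', 'B', 'B', 'B'),
                 receiver = c('R1', 'R2', 'R1', 'R2', 'R3'),
                 detections = c(30, 10, 5, 5, 10))

## Receiver use = detections at a receiver / all detections of that tag
toy$receiver_use = toy$detections / ave(toy$detections, toy$tag_id, FUN = sum)

## Within each tag the index sums to 1
print(toy)
print(tapply(toy$receiver_use, toy$tag_id, sum))
```
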
## Calculate Pianka's Niche Overlap Index
Pianka (1973), The Structure of Lizard Communities: 0 = no overlap, 1 = perfect overlap
```{r}
## Aggregate data averaged by species
receiver_use_aggregated_by_species = aggregate(receiver_use ~ species + receiver , data = detections_per_tag_per_receiver, FUN = mean)
  colnames(receiver_use_aggregated_by_species) = c('species', 'receiver', 'avg_use_index')
  
## Reshape from Long to Wide format (name the value column explicitly so dcast does not have to guess)
receiver_use_aggregated_by_species_wide = dcast(receiver_use_aggregated_by_species, species ~ receiver, value.var = 'avg_use_index')

## Get all species combinations 
species_combos = data.frame()
for (i in 1:nrow(receiver_use_aggregated_by_species_wide)){
    if(i != nrow(receiver_use_aggregated_by_species_wide)){
    for (j in (i+1):nrow(receiver_use_aggregated_by_species_wide)){
      species_combos = rbind(species_combos, data.frame('species_1' = receiver_use_aggregated_by_species_wide$species[i], 'species_2' = receiver_use_aggregated_by_species_wide$species[j]))
    }
  }
}

## Change any NA values to zero
receiver_use_aggregated_by_species_wide[is.na(receiver_use_aggregated_by_species_wide)] = 0

## Calculate Pianka's index for all pairs
species_combos$pianka_index = 0
for(i in 1:nrow(species_combos)){
  species_combos$pianka_index[i] = sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_1[i], -1] * 
     receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_2[i], -1]) /
    (sqrt(sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_1[i], -1] ^ 2) * 
    sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_2[i], -1] ^ 2)))
}

## Round to 3 digits
species_combos$pianka_index = round(species_combos$pianka_index, 3)

print(species_combos)
```
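
For reference, the loop above evaluates Pianka's index for species $j$ and $k$ over receivers $i$, where $p_{ij}$ is the average use of receiver $i$ by species $j$:

$$O_{jk} = \frac{\sum_i p_{ij}\, p_{ik}}{\sqrt{\sum_i p_{ij}^2 \, \sum_i p_{ik}^2}}$$

A minimal sketch of the same calculation on a made-up use matrix follows. The helper `pianka_overlap`, the matrix `use`, and the species names are hypothetical and only meant to make the formula easy to check by hand.

```r
## Pianka's overlap between two vectors of receiver use
pianka_overlap = function(p, q) {
  sum(p * q) / sqrt(sum(p^2) * sum(q^2))
}

## Hypothetical average receiver use: rows are species, columns are receivers
use = rbind(species_a = c(0.5, 0.5, 0.0),
            species_b = c(0.0, 0.5, 0.5),
            species_c = c(0.5, 0.5, 0.0))

## All unordered species pairs, one pair per row
pairs = t(combn(rownames(use), 2))

overlap = data.frame(species_1 = pairs[, 1],
                     species_2 = pairs[, 2],
                     pianka_index = apply(pairs, 1, function(pr) pianka_overlap(use[pr[1], ], use[pr[2], ])))

## pianka_index is 0.5 (a vs b), 1 (a vs c), 0.5 (b vs c)
print(overlap)
```
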

## Plots
### Study Area
```{r}
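## Plot study area and receivers (one point per unique receiver position)
molo_basemap = get_map(location = c(lon = -156.496331, lat = 20.633007), zoom = 16, maptype = 'satellite')
receiver_map = ggmap(molo_basemap) + 
  geom_point(data = unique(molo_df[ ,c('receiver', 'lat', 'lon')]), mapping = aes(x = lon, y = lat), col = 'red') + 
  labs(x = '°Longitude', y = '°Latitude')
## Save with a separate call; ggsave() cannot be added to a plot with `+` in current ggplot2 releases
ggsave(filename = 'Receiver Locations Google Map.pdf', plot = receiver_map, path = figure_directory)
print(receiver_map)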
```

### Species Use Plots
```{r}
## Get average use of receiver by species 
species_receiver_use = aggregate(receiver_use~species+receiver, data = detections_per_tag_per_receiver, FUN = mean)
  colnames(species_receiver_use) = c('species', 'receiver' , 'receiver_use')
  
## Merge with lat lon positions for each receiver from molo_df
receiver_positions = unique(molo_df[ ,c('receiver', 'lat', 'lon')])
species_receiver_use = merge(x = species_receiver_use, y = receiver_positions, by = 'receiver', all.x = T, all.y = F)

## Make species plots for receiver use (one plot per unique species)
for(spp in unique(species_receiver_use$species)){
  receiver_use_by_spp = ggmap(molo_basemap) + 
    geom_point(data = species_receiver_use[species_receiver_use$species == spp, ], 
               mapping = aes(x = lon, y = lat, size = receiver_use), color = 'red') + 
    labs(x = '°Longitude', y = '°Latitude')
  ggsave(filename = paste('Receiver Use by ', spp, '.pdf', sep = ''), plot = receiver_use_by_spp, path = figure_directory)
  print(receiver_use_by_spp)
}
```

### Day Night Plots
```{r}
### Day Night Plots
## For all fish
pdf(file = file.path(figure_directory, 'Day Night Plot - All Fish.pdf'))
plot_day_night(molo_df, plot_title = 'All Fish')
dev.off()

## By Species
for (spp in unique(molo_df$species)){
  pdf(file = file.path(figure_directory, paste('Day Night Plot - Species ', spp, '.pdf', sep = '')))
  ## Subset on species directly; comparing tag_id against a vector of tag IDs recycles and picks the wrong rows
  plot_day_night(molo_df[molo_df$species == spp, ], plot_title = spp)
  dev.off()
}

## By Individual
for (tag_id in unique(molo_df$tag_id)){
  print(tag_id)
  pdf(file = file.path(figure_directory, paste('Day Night Plot - Tag ID ', tag_id, '.pdf', sep = '')))
  plot_day_night(molo_df[molo_df$tag_id == tag_id, ], plot_title = paste(tagging_df$species[tagging_df$tag_id == tag_id], '- Tag', as.character(tag_id), sep = ' '))
  dev.off()
}
```


### Barplot of detections by date
```{r}
### Bar plot of detections in crater by date 
detections_per_day_df = count_detections_per_date(molo_df)

## Barplot of detections by individual
for(i in 1:nrow(detections_per_day_df)){
  ## Convert from wide to long format
  indv_data = melt(detections_per_day_df[i, ])
  colnames(indv_data) = c('date', 'detections')
  
  ## Make and save plot (ggsave is called separately; it cannot be chained onto a plot with `+`)
  indv_plot = ggplot(data = indv_data, mapping = aes(x = date, y = detections)) +
    geom_bar(stat = "identity") + 
    labs(title = paste('Tag', rownames(detections_per_day_df)[i]), x = 'Date', y = 'Detections')
  ggsave(filename = paste0('Daily Detection Barplot - ', rownames(detections_per_day_df)[i], '.pdf'), plot = indv_plot, path = figure_directory)
}

## Detections by species
detections_per_day_spp_stg = detections_per_day_df
detections_per_day_spp_stg$tag_id = rownames(detections_per_day_spp_stg)
detections_per_day_spp_stg = left_join(x = detections_per_day_spp_stg, tagging_df[ ,c('tag_id', 'species')], by = 'tag_id')

## Loop through species
for (spp in unique(detections_per_day_spp_stg$species)){
  
  ## Subset individual df by species
  spp_subset_df = detections_per_day_spp_stg[detections_per_day_spp_stg$species == spp, -which(colnames(detections_per_day_spp_stg) %in% c('tag_id', 'species'))]
  
  ## Convert to long format
  detections_per_spp = melt(colSums(spp_subset_df), value.name =   'detections')
  detections_per_spp$date = rownames(detections_per_spp)
  
  ## Make and save plot (ggsave is called separately; it cannot be chained onto a plot with `+`)
  spp_plot = ggplot(data = detections_per_spp, mapping = aes(x = date, y = detections)) +
    geom_bar(stat = "identity") + 
    labs(title = spp, x = 'Date', y = 'Detections')
  ggsave(filename = paste0('Daily Detection Barplot - ', spp, '.pdf'), plot = spp_plot, path = figure_directory)
}

## Barplot of all detections
all_detections = colSums(detections_per_day_df)
## Convert to long format
all_detections_long = melt(all_detections, value.name =   'detections')
all_detections_long$date = rownames(all_detections_long)
  
## Make and save plot (ggsave is called separately; it cannot be chained onto a plot with `+`)
all_detections_plot = ggplot(data = all_detections_long, mapping = aes(x = date, y = detections)) +
  geom_bar(stat = "identity") + 
  labs(title = 'All Tagged Individuals', x = 'Date', y = 'Detections')
ggsave(filename = 'Daily Detection Barplot - all tags.pdf', plot = all_detections_plot, path = figure_directory)
print(all_detections_plot)
```

### Bar plot # of Fish (standardized percent of fish tagged to date) by date and Spp


THIS NEEDS WORK!!!
```{r}
## Convert detections_per_day to presence/absence
presence_absence_wide_df = detections_per_day_df
presence_absence_wide_df[presence_absence_wide_df > 0] = 1

## Convert from wide to long format
presence_absence_long_df = melt(presence_absence_wide_df, id.vars = c('date'), measure.vars = colnames(presence_absence_wide_df)[2:ncol(presence_absence_wide_df)], variable.name = 'tag_id', value.name = 'detected')

# Drop 'tag_' prefix from tag_id column for matching purposes
presence_absence_long_df$tag_id = levels(presence_absence_long_df$tag_id)[presence_absence_long_df$tag_id]
for(i in 1:nrow(presence_absence_long_df)){
  presence_absence_long_df$tag_id[i] = strsplit(presence_absence_long_df$tag_id[i], split = '_')[[1]][2]
}

## Merge with species from tagging data
presence_absence_long_df = merge(x = presence_absence_long_df, y = tagging_df[ ,c('tag_id', 'species')], on = 'tag_id')

## Drop date and tag pairs preceding the date the fish was tagged
## (assumes the tagging date is stored in tagging_df$tagging_date, the column used in the joins above)
tagging_dates = as.Date(tagging_df$tagging_date[match(presence_absence_long_df$tag_id, tagging_df$tag_id)])
indices_to_keep = which(as.Date(presence_absence_long_df$date) >= tagging_dates)
presence_absence_long_df = presence_absence_long_df[indices_to_keep, ]

## Get a list of active tags by date and species
active_tags_by_date = aggregate(tag_id ~ date + species, data = presence_absence_long_df, FUN = uniqueN)
  colnames(active_tags_by_date) = c('date', 'species', 'deployed_tags')

## Standardize tag counts by tags deployed and plot as % of tags detected per day by species
for(species in unique(presence_absence_long_df$species)){
  
  # Count number of tags detected daily by species 
  presence_absence_by_spp_df = aggregate(detected~date, data = presence_absence_long_df[presence_absence_long_df$species == species, ], FUN = sum)
  colnames(presence_absence_by_spp_df) = c('date', 'tags_detected')
  
  # Standardize daily tag count by the number of tags deployed (expressed as a percentage to match the axis label)
  presence_absence_by_spp_df = merge(x = presence_absence_by_spp_df, y = active_tags_by_date[active_tags_by_date$species == species, ], by = 'date')
  presence_absence_by_spp_df$percent_tags_detected = 100 * presence_absence_by_spp_df$tags_detected / presence_absence_by_spp_df$deployed_tags
  
  # Make plot at species level
  spp_standardized_plot = ggplot(data = presence_absence_by_spp_df, mapping = aes(x = date, y = percent_tags_detected)) + 
    geom_bar(stat = 'identity') + 
    labs(title = species, x = 'Date', y = '% of tags detected')
  ggsave(filename = paste('Detections Standardized By Species - ', species, '.pdf', sep = ''), plot = spp_standardized_plot, path = figure_directory)
}

```

### Bar plot vessel traffic by date
In the future, might also consider max vessels present at a given time
```{r}

## Calculate Daily Vessel Stats: number of unique vessels recorded per day
vessels_per_day = aggregate(vessel_name ~ as.Date(date), data = vessel_df, FUN = uniqueN)
colnames(vessels_per_day) = c('date', 'total_vessels')

# Make plot for total vessels per day
total_vessels_plot = ggplot(data = vessels_per_day, mapping = aes(x = date, y = total_vessels)) + 
    geom_bar(stat = 'identity') + 
    labs(title = 'Total Number of Vessels Daily', x = 'Date', y = '# of Vessels')
ggsave(filename = 'Total Vessels Daily.pdf', plot = total_vessels_plot, path = figure_directory)

print(total_vessels_plot)
```


### Scatter plot x axis boat traffic, y axis presence / absence color by spp
```{r}

```

### Scatter plot x axis boat traffic, y axis detections per individual color by spp add error bars for daily detections
```{r}

```

# Residency and dispersal
```{r}
## Calculate residency
detection_stats$residence_metric = detection_stats$unique_days / detection_stats$days_at_liberty

## Assign residence category: low = < 1/3, medium = 1/3 to < 2/3, high = >= 2/3 of days detected (Tinhan et al. 2014)
detection_stats$residence_category = cut(detection_stats$residence_metric, breaks = c(-Inf, 1/3, 2/3, Inf), labels = c('Low', 'Medium', 'High'), right = FALSE)

## Create grouped barplot of residency by species
residence_counts_by_species = aggregate(tag_id ~ species + residence_category, data = detection_stats, FUN = length)

ggplot(data = residence_counts_by_species, mapping = aes(x=species, y=tag_id, fill=residence_category)) +
  geom_bar(stat="identity", position = "dodge")
```

Takeaways: All four omilu were highly resident, as were the grey reef sharks. No other species has replicate individuals, so it is unclear what can be concluded for them.

## Calculate 30 day moving average of residency, then plot against days since tagging
```{r}
## Get total days in the study, counting both the first and last date so a detection on the final day has a slot
total_days_in_study = as.numeric(max(molo_df$date) - min(molo_df$date)) + 1

## Create a dataframe where rows are tag id and columns are study date
present_after_n_days_df = data.frame()

## Determine if a tag was detected on a receiver n days after tagging
for (i in 1:uniqueN(molo_df$tag_id)){
  ## Subset data for individual tags
  indv_data = molo_df[molo_df$tag_id == unique(molo_df$tag_id)[i], ]
  ## Determine if a fish was present n days after tagging
  difftimes = rep(0, len = total_days_in_study)
  # determine difference in days between each unique day a tag was detected and the tag's earliest detection, flip the corresponding value in difftimes array to 1
  detected_dates = unique(indv_data$date)
  for (j in 1:length(detected_dates)){
    difftimes[as.numeric(diff.Date(c(min(indv_data$date), detected_dates[j]))) + 1] = 1
  }
  df_row = c(unique(molo_df$tag_id)[i], difftimes)
  present_after_n_days_df = rbind(present_after_n_days_df, df_row)
}
colnames(present_after_n_days_df) = c('tag_id', as.character(1:total_days_in_study))

## Convert from wide format to long format
present_after_n_days_df_long_df = melt(present_after_n_days_df, id.vars = 'tag_id', measure.vars = colnames(present_after_n_days_df)[2:ncol(present_after_n_days_df)], variable.name = 'day', value.name = 'detected')

## Merge with species data
present_after_n_days_df_long_df = left_join(x = present_after_n_days_df_long_df, y = tagging_df[ ,c('tag_id', 'species')], by = 'tag_id')
# Recast to numeric because of the join function
present_after_n_days_df_long_df$detected = as.numeric(present_after_n_days_df_long_df$detected)

## Calculate number of each species present n days after tagging
species_presence_after_tagging = aggregate(detected ~ species + day, data = present_after_n_days_df_long_df, FUN = sum)
  colnames(species_presence_after_tagging) = c('species', 'day', 'n_detected')

## Count unique tags by species
individuals_per_species = aggregate(tag_id ~ species, data = present_after_n_days_df_long_df, FUN = uniqueN)
colnames(individuals_per_species) = c('species', 'n_tagged')

## Standardize species level daily counts by number of tags belonging to that species
species_presence_after_tagging = left_join(x = species_presence_after_tagging, y = individuals_per_species, by = 'species')
species_presence_after_tagging$percent_individuals_detected = species_presence_after_tagging$n_detected / species_presence_after_tagging$n_tagged

## Convert day from factor to numeric
species_presence_after_tagging$day = as.numeric(levels(species_presence_after_tagging$day)[species_presence_after_tagging$day])

## remove any NA days
species_presence_after_tagging = species_presence_after_tagging[!is.na(species_presence_after_tagging$day), ]

## Calculate 30 day moving average (trailing window covering days i-29 through i)
spp_presence_30_day_avg = data.frame()
for (species in unique(species_presence_after_tagging$species)){
  spp_presence_after_tagging = species_presence_after_tagging[species_presence_after_tagging$species == species, ]
  moving_average_30 = c()
  for (i in 30:max(spp_presence_after_tagging$day)){
    moving_average_30 = c(moving_average_30, mean(spp_presence_after_tagging$percent_individuals_detected[spp_presence_after_tagging$day > i-30 & spp_presence_after_tagging$day <= i]))
  }
  df_row = c(species, moving_average_30)
  spp_presence_30_day_avg = rbind(spp_presence_30_day_avg, df_row)
}
## Label each column with the last day of its window (30, 31, ...)
colnames(spp_presence_30_day_avg) = c('species', as.character(29 + seq_len(ncol(spp_presence_30_day_avg) - 1)))

## Convert from wide format to long format
spp_presence_30_day_avg_long_df = melt(spp_presence_30_day_avg, id.vars = 'species', measure.vars = colnames(spp_presence_30_day_avg)[2:ncol(spp_presence_30_day_avg)], variable.name = 'day', value.name = 'percent_individuals_detected')

# Convert percent_individuals_detected and date
spp_presence_30_day_avg_long_df$percent_individuals_detected = as.numeric(spp_presence_30_day_avg_long_df$percent_individuals_detected)

spp_presence_30_day_avg_long_df$day = as.numeric(levels(spp_presence_30_day_avg_long_df$day)[spp_presence_30_day_avg_long_df$day])


## Generate line plot
present_after_tagging_plot = ggplot(spp_presence_30_day_avg_long_df, mapping = aes(x = day, y = percent_individuals_detected, color = species)) + 
  geom_line() + 
  labs(x = 'Number of days', y = 'Proportion present')
ggsave(filename = 'Proportion of tags present after tagging.pdf', plot = present_after_tagging_plot, path = figure_directory)

print(present_after_tagging_plot)
```
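
The nested loop above could also be written as a trailing moving average with `stats::filter`. This is a minimal sketch on a made-up vector, using a 3-day window so the result is easy to verify by hand. `trailing_mean` is a hypothetical helper and is not part of 'Utility Functions.R'.

```r
## Trailing moving average: each value is the mean of the current and previous (window - 1) observations
trailing_mean = function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

## Hypothetical daily proportions (placeholder values only)
daily_values = c(1, 2, 3, 4, 5)

## The first (window - 1) entries are NA because the window is incomplete
smoothed = trailing_mean(daily_values, window = 3)
print(smoothed)
```

For the notebook's data this would be applied per species with `window = 30`, assuming one row per day with no gaps in the `day` column.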

# Statistical analysis

Calculate mean residency by species (regardless of time), then run an ANOVA by species.
Use Tukey's HSD to determine which pairwise differences are significant.
```{r}
## ANOVA model for residency metric by species
residence_by_species_anova = aov(residence_metric ~ species, data=detection_stats)
summary(residence_by_species_anova)

## Tukey's Honestly Significant Differences between species
TukeyHSD(residence_by_species_anova)
```

GLM comparing residency by species: the independent variable is time at liberty and the dependent variable is the residency index
```{r}
## Fit binomial GLM to average residency metric data (proportional between 0-1)
species_glm = glm(residence_metric ~  species * days_at_liberty, data = detection_stats, family = binomial(logit))
summary(species_glm)
```
No Significant differences found 
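
The binomial GLM above is fit to a proportion without telling `glm` how many days each proportion is based on, which typically triggers a "non-integer #successes" warning. One option worth comparing is to supply the trial counts, either as a two-column response or through `weights`. The sketch below uses made-up values (the data frame `toy` is hypothetical) to show that the two forms give the same coefficients. Results from the real data may differ from the unweighted fit, so this is a suggestion to check rather than a replacement.

```r
## Hypothetical tags: days detected, days at liberty, and fork length
toy = data.frame(unique_days = c(10, 40, 80, 5, 60, 90),
                 days_at_liberty = c(100, 100, 100, 50, 120, 100),
                 fork_length = c(30, 45, 60, 28, 50, 70))
toy$residence_metric = toy$unique_days / toy$days_at_liberty

## Form 1: response given as (days detected, days not detected)
fit_counts = glm(cbind(unique_days, days_at_liberty - unique_days) ~ fork_length,
                 data = toy, family = binomial(logit))

## Form 2: response given as a proportion, with the number of days as weights
fit_weights = glm(residence_metric ~ fork_length, data = toy,
                  family = binomial(logit), weights = days_at_liberty)

## Both forms describe the same model
print(all.equal(coef(fit_counts), coef(fit_weights)))
```
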


## GLM comparing time in crater to vessel traffic
```{r}

```
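
The chunk above is still empty. One possible starting point, following the proposed approach of a GLM with an AR(1)-style term, is to include the previous day's proportion of tags detected as a predictor alongside vessel traffic. The sketch below runs on simulated data only. `n_tagged`, `vessels`, `detected`, and `prop_lag1` are hypothetical names, and a lagged response is a simple stand-in for a full AR(1) error structure.

```r
set.seed(1)

## Simulated daily data (not study data): vessel counts and number of tags detected
n_days = 200
n_tagged = 10
vessels = rpois(n_days, lambda = 8)
detected = rbinom(n_days, size = n_tagged, prob = plogis(0.5 - 0.1 * vessels))
daily = data.frame(day = 1:n_days, vessels = vessels, detected = detected)

## Lag-1 term: proportion of tags detected on the previous day (NA for the first day)
daily$prop_lag1 = c(NA, head(daily$detected, -1) / n_tagged)

## Binomial GLM of tags detected out of tags deployed; glm drops the first day because of the NA lag
vessel_glm = glm(cbind(detected, n_tagged - detected) ~ vessels + prop_lag1,
                 data = daily, family = binomial(logit))
summary(vessel_glm)
```

With the real data, species could be added as a further term, and the time step (daily or 6 hours) would depend on the resolution of the vessel data.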